<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Text analysis overview | ElasticSearch 7.7 权威指南中文版</title>
	<meta name="keywords" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <meta name="description" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'analysis-overview.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-overview.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-overview.html</a>, 原文档版权归 www.elastic.co 所有<br/>本地英文版地址: <a href="../en/analysis-overview.html" rel="nofollow" target="_blank">../en/analysis-overview.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
<strong>重要</strong>: 此版本不会发布额外的bug修复或文档更新。最新信息请参考 <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">当前版本文档</a>。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch Guide [7.7]</a></span>
»
<span class="breadcrumb-link"><a href="analysis.html">Text analysis</a></span>
»
<span class="breadcrumb-node">Text analysis overview</span>
</div>
<div class="navheader">
<span class="prev">
<a href="analysis.html">« Text analysis</a>
</span>
<span class="next">
<a href="analysis-concepts.html">Text analysis concepts »</a>
</span>
</div>
<div class="chapter">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="analysis-overview"></a>Text analysis overview<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/overview.asciidoc">edit</a>
</h2>
</div></div></div>

<p>Text analysis enables Elasticsearch to perform full-text search, where the search returns
all <em>relevant</em> results rather than just exact matches.</p>
<p>If you search for <code class="literal">Quick fox jumps</code>, you probably want the document that
contains <code class="literal">A quick brown fox jumps over the lazy dog</code>, and you might also want
documents that contain related words like <code class="literal">fast fox</code> or <code class="literal">foxes leap</code>.</p>
<h3>
<a id="tokenization"></a>Tokenization<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/overview.asciidoc">edit</a>
</h3>
<p>Analysis makes full-text search possible through <em>tokenization</em>: breaking a text
down into smaller chunks, called <em>tokens</em>. In most cases, these tokens are
individual words.</p>
<p>If you index the phrase <code class="literal">the quick brown fox jumps</code> as a single string and the
user searches for <code class="literal">quick fox</code>, it isn’t considered a match. However, if you
tokenize the phrase and index each word separately, the terms in the query
string can be looked up individually. This means they can be matched by searches
for <code class="literal">quick fox</code>, <code class="literal">fox brown</code>, or other variations.</p>
<h3>
<a id="normalization"></a>Normalization<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/overview.asciidoc">edit</a>
</h3>
<p>Tokenization enables matching on individual terms, but each token is still
matched literally. This means:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
A search for <code class="literal">Quick</code> would not match <code class="literal">quick</code>, even though you likely want
either term to match the other
</li>
<li class="listitem">
Although <code class="literal">fox</code> and <code class="literal">foxes</code> share the same root word, a search for <code class="literal">foxes</code>
would not match <code class="literal">fox</code> or vice versa.
</li>
<li class="listitem">
A search for <code class="literal">jumps</code> would not match <code class="literal">leaps</code>. While they don’t share a root
word, they are synonyms and have a similar meaning.
</li>
</ul>
</div>
<p>To solve these problems, text analysis can <em>normalize</em> these tokens into a
standard format. This allows you to match tokens that are not exactly the same
as the search terms, but similar enough to still be relevant. For example:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<code class="literal">Quick</code> can be lowercased: <code class="literal">quick</code>.
</li>
<li class="listitem">
<code class="literal">foxes</code> can be <em>stemmed</em>, or reduced to its root word: <code class="literal">fox</code>.
</li>
<li class="listitem">
<code class="literal">jump</code> and <code class="literal">leap</code> are synonyms and can be indexed as a single word: <code class="literal">jump</code>.
</li>
</ul>
</div>
<p>To ensure search terms match these words as intended, you can apply the same
tokenization and normalization rules to the query string. For example, a search
for <code class="literal">Foxes leap</code> can be normalized to a search for <code class="literal">fox jump</code>.</p>
<h3>
<a id="analysis-customization"></a>Customize text analysis<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/overview.asciidoc">edit</a>
</h3>
<p>Text analysis is performed by an <a class="xref" href="analyzer-anatomy.html" title="Anatomy of an analyzer"><em>analyzer</em></a>, a set of rules
that govern the entire process.</p>
<p>Elasticsearch includes a default analyzer, called the
<a class="xref" href="analysis-standard-analyzer.html" title="Standard Analyzer">standard analyzer</a>, which works well for most use
cases right out of the box.</p>
<p>If you want to tailor your search experience, you can choose a different
<a class="xref" href="analysis-analyzers.html" title="Built-in analyzer reference">built-in analyzer</a> or even
<a class="xref" href="analysis-custom-analyzer.html" title="Create a custom analyzer">configure a custom one</a>. A custom analyzer gives you
control over each step of the analysis process, including:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
Changes to the text <em>before</em> tokenization
</li>
<li class="listitem">
How text is converted to tokens
</li>
<li class="listitem">
Normalization changes made to tokens before indexing or search
</li>
</ul>
</div>
</div>
<div class="navfooter">
<span class="prev">
<a href="analysis.html">« Text analysis</a>
</span>
<span class="next">
<a href="analysis-concepts.html">Text analysis concepts »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>